A Generic Approach to Bulk Loading Multidimensional Index Structures

Authors

  • Jochen Van den Bercken
  • Bernhard Seeger
  • Peter Widmayer

Abstract

Recently there has been an increasing interest in supporting bulk operations on multidimensional index structures. Bulk loading refers to the process of creating an initial index structure for a presumably very large data set. In this paper, we present a generic algorithm for bulk loading which is applicable to a broad class of index structures. Our approach differs completely from previous ones for the following reasons. First, sorting multidimensional data according to a predefined global ordering is completely avoided. Instead, our approach is based on the standard routines for splitting and merging pages which are already fully implemented in the corresponding index structure. Second, in contrast to inserting records one by one, our approach is based on the idea of inserting multiple records simultaneously. As an example we demonstrate in this paper how to apply our technique to the R-tree family. For R-trees we show that the I/O performance of our generic algorithm meets the lower bound of external sorting. Empirical results demonstrate that performance improvements are also achieved in practice without sacrificing query performance.
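
The abstract does not spell out the algorithm itself, so the following is only a minimal Python sketch of the two ideas it names: whole batches of records are pushed down the tree together, and the only restructuring primitive used is the index's own page split. Everything here is an assumption made for illustration: the `Node` class, the page capacity `CAPACITY`, the toy median split along x, the least-enlargement `choose_subtree`, and the function names. It keeps everything in memory and does not model the I/O behaviour analysed in the paper.

```python
from dataclasses import dataclass, field

CAPACITY = 4  # hypothetical maximum number of entries per page


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (x, y) points or child Nodes


def box(entry):
    """Bounding box (xmin, ymin, xmax, ymax) of a point or of a subtree."""
    if isinstance(entry, Node):
        boxes = [box(e) for e in entry.entries]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))
    x, y = entry
    return (x, y, x, y)


def enlargement(b, p):
    """Area added to box b when it is grown to cover point p."""
    x, y = p
    grown = (min(b[0], x), min(b[1], y), max(b[2], x), max(b[3], y))
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return area(grown) - area(b)


def split(node):
    """Toy stand-in for the index's own split routine: median cut along x."""
    ordered = sorted(node.entries, key=lambda e: box(e)[0] + box(e)[2])
    mid = len(ordered) // 2
    return Node(node.leaf, ordered[:mid]), Node(node.leaf, ordered[mid:])


def split_until_fits(node):
    """Apply split repeatedly until every resulting page respects CAPACITY."""
    if len(node.entries) <= CAPACITY:
        return [node]
    left, right = split(node)
    return split_until_fits(left) + split_until_fits(right)


def bulk_insert(node, batch):
    """Insert a whole batch below node; return the page(s) that replace it."""
    if node.leaf:
        node.entries.extend(batch)
    else:
        # Route every record of the batch to a child in a single pass.
        boxes = [box(child) for child in node.entries]
        groups = {}
        for p in batch:
            i = min(range(len(boxes)), key=lambda k: enlargement(boxes[k], p))
            groups.setdefault(i, []).append(p)
        children = []
        for i, child in enumerate(node.entries):
            children.extend(bulk_insert(child, groups[i]) if i in groups else [child])
        node.entries = children
    return split_until_fits(node)


def bulk_load(points, batch_size=16):
    """Build a tree by feeding the points in batches instead of one by one."""
    root = Node(leaf=True)
    for start in range(0, len(points), batch_size):
        parts = bulk_insert(root, points[start:start + batch_size])
        while len(parts) > 1:  # the root was split: grow the tree by one level
            parts = split_until_fits(Node(leaf=False, entries=parts))
        root = parts[0]
    return root


def iter_leaves(node, depth=0):
    """Yield (depth, leaf) pairs, handy for inspecting the result."""
    if node.leaf:
        yield depth, node
    else:
        for child in node.entries:
            yield from iter_leaves(child, depth + 1)
```

Calling `bulk_load(points, batch_size=16)` on a list of `(x, y)` tuples returns the root page. No global sort order over the points is ever computed; each batch is only partitioned among the children of the page it currently sits in, and overflowing pages are handled by `split` alone.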

Similar resources

An Evaluation of Generic Bulk Loading Techniques

Bulk loading refers to the process of creating an index from scratch for a given data set. This problem is well understood for B-trees, but so far, non-traditional index structures received modest attention. We are particularly interested in fast generic bulk loading techniques whose implementations only employ a small interface that is satisfied by a broad class of index structures. Generic te...

The Bulk Index Join: A Generic Approach to Processing Non-Equijoins

Efficient join algorithms have been developed for processing different types of non-equijoins like spatial join, band join, temporal join or similarity join. Each of these previously proposed join algorithms is tailor-cut for a specific type of join, and a generalization of these algorithms to other join types is not obvious. We present an efficient algorithm called bulk index join that can be ...

Space-Partitioning-Based Bulk-Loading for the NSP-Tree in Non-ordered Discrete Data Spaces

Properly-designed bulk-loading techniques are more efficient than the conventional tuple-loading method in constructing a multidimensional index tree for a large data set. Although a number of bulk-loading algorithms have been proposed in the literature, most of them were designed for continuous data spaces (CDS) and cannot be directly applied to non-ordered discrete data spaces (NDDS). In this ...

Bulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces

Applications demanding multidimensional index structures for performing efficient similarity queries often involve a large amount of data. The conventional tuple-loading approach to building such an index structure for a large data set is inefficient. To overcome the problem, a number of algorithms to bulk-load the index structures, like the R-tree, from scratch for large data sets in continuous...

Efficient Bulk Loading of Large High-Dimensional Indexes

Efficient index construction in multidimensional data spaces is important for many knowledge discovery algorithms, because construction times typically must be amortized by performance gains in query processing. In this paper, we propose a generic bulk loading method which allows the application of user-defined split strategies in the index construction. This approach allows the adaptation of t...

Publication date: 1997